Numpy (numerical arrays for numeric computation)

Numpy is the fundamental Python module for scientific computing. Its most used object is the multidimensional array. These arrays can have any number of dimensions and are stored efficiently in the computer's RAM, which makes the data easy to handle and to pass to other libraries. Furthermore, most of numpy is implemented in C, which makes it efficient and fast.

Multidimensional arrays

This is how numpy is usually imported and used to create a numpy array


In [ ]:
import numpy as np

In [ ]:
data = [1, 10 , 2, 3, 8.0] # data is a list
a = np.array(data) # a is now a numpy array

In [ ]:
type(a)

In [ ]:
a

This gives the shape of the array


In [ ]:
a.shape

the number of dimensions


In [ ]:
a.ndim

the number of elements


In [ ]:
a.size

the number of bytes


In [ ]:
a.nbytes

The attribute dtype describes the element data type


In [ ]:
a.dtype
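The attributes above are related to each other. As a minimal sketch, the total number of bytes should equal the number of elements times the size of one element, which is available through `itemsize` (an attribute not otherwise used in this notebook). The array name `x` is only for illustration.

```python
import numpy as np

# A single float in the list makes numpy store every element as float64
x = np.array([1, 10, 2, 3, 8.0])

print(x.shape)     # (5,)
print(x.ndim)      # 1
print(x.size)      # 5
print(x.dtype)     # float64
print(x.itemsize)  # 8 (bytes per float64 element)

# nbytes is the number of elements times the bytes per element
print(x.nbytes == x.size * x.itemsize)  # True
```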

Creating new arrays

Arrays can be created with nested lists


In [ ]:
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)

In [ ]:
b

In [ ]:
b.shape, b.ndim, b.size, b.nbytes

The function arange is similar to range but it creates an array and not a list


In [ ]:
c = np.arange(10) 
c

The function linspace creates a given number of equally spaced points between two values


In [ ]:
e = np.linspace(0.0, 10, 21) # 21 points from 0 to 10 (both included), spaced by 0.5
e
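The difference between arange and linspace is easy to mix up, so here is a small side-by-side sketch. With arange you choose the step and the stop value is excluded; with linspace you choose the number of points and the stop value is included by default. The names `p` and `q` are only for illustration.

```python
import numpy as np

# arange: start, stop (excluded) and step
p = np.arange(0.0, 10.0, 0.5)

# linspace: start, stop (included by default) and number of points
q = np.linspace(0.0, 10.0, 21)

print(p.size, p[-1])  # 20 9.5
print(q.size, q[-1])  # 21 10.0
print(q[1] - q[0])    # 0.5, the spacing between consecutive points
```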

Similar to MATLAB, there are also functions such as empty, zeros and ones. They take the shape of the new array as a tuple.


In [ ]:
np.empty((4,4))

In [ ]:
np.zeros((3,3))

In [ ]:
np.ones((3,3))
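As a short sketch of how these constructors behave: zeros and ones return float arrays unless a dtype is given, and empty only reserves memory, so its contents are arbitrary until you assign them. The function `np.full` used below is not mentioned elsewhere in this notebook; it works like ones but with a fill value of your choice.

```python
import numpy as np

z = np.zeros((2, 3))            # shape is a tuple: 2 rows, 3 columns
o = np.ones((2, 3), dtype=int)  # the dtype can be set explicitly
f = np.full((2, 3), 7.5)        # every element is set to 7.5

u = np.empty((2, 3))            # contents are arbitrary, not guaranteed to be zero
u[:] = 0.0                      # so always fill an empty array before using it

print(z.dtype)  # float64, the default
print(o)
print(f)
```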

dtype

dtype (short for data type) is the attribute that holds the data type of the array's elements. It is usually inferred from the data but can be set explicitly when the array is created

For instance, this array is implicitly given an integer dtype


In [ ]:
a = np.array([0, 1, 2, 3])

In [ ]:
a, a.dtype

But you could force the creation of a complex array


In [ ]:
b = np.zeros((2,2), dtype=np.complex64)
b

or a float array


In [ ]:
c = np.arange(0, 10, 2, dtype=np.float64)
c
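The dtype of an existing array can also be changed after creation. The following is a minimal sketch using the `astype` method, which is not covered elsewhere in this notebook; the array names are only for illustration.

```python
import numpy as np

v = np.array([0.5, 1.7, 2.9])
print(v.dtype)          # float64

# astype returns a new array with the requested dtype; v itself is unchanged
w = v.astype(np.int64)
print(w)                # values 0, 1, 2: the fractional part is dropped
print(v)                # still 0.5, 1.7, 2.9

# Mixing integers and floats in one list upcasts everything to float
m = np.array([1, 2, 3.0])
print(m.dtype)          # float64
```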

Operations over arrays

Mathematical operations can be performed over the whole array without running a for loop.

For instance


In [ ]:
a = np.linspace(0.0, 10.0, 5)
print('a =', a)

b = np.ones(5)
print('b =',b)

In [ ]:
a * 2 # every element in the array is multiplied by 2

In [ ]:
a + b # addition works element by element; the same goes for every other arithmetic operation
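To make the point about avoiding for loops concrete, here is a small sketch that computes the same result twice, once with an explicit loop and once with a single array expression. The names `looped` and `vectorized` are only for illustration.

```python
import numpy as np

a = np.linspace(0.0, 10.0, 5)  # values 0, 2.5, 5, 7.5, 10
b = np.ones(5)

# Explicit loop: one element at a time
looped = np.empty(5)
for i in range(5):
    looped[i] = a[i] * 2 + b[i]

# Array expression: the whole array at once
vectorized = a * 2 + b

print(vectorized)                          # values 1, 6, 11, 16, 21
print(np.array_equal(looped, vectorized))  # True
```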

Slicing

Slicing works on arrays just as it does on lists, except that it can now be applied along several dimensions at once


In [ ]:
a = np.random.rand(5, 5) # this creates a two-dimensional array of random numbers

In [ ]:
print(a)

Each dimension has its own index


In [ ]:
print(a[0,0], a[0,1]) # the first index corresponds to rows, the second to columns

To extract the values of a whole column, the following syntax can be used


In [ ]:
a[:,0] # this is the first column

The last row could be extracted as follows


In [ ]:
a[-1,:] #this is the last row

Slicing also works with ranges


In [ ]:
a[0:2,0:3]

Assignment also works with slicing


In [ ]:
a[0:2,0:3] = -4.0

In [ ]:
a
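One detail worth knowing when assigning through slices: a basic slice of a numpy array is a view on the same memory, not a copy, so changes made through the slice show up in the original array. The sketch below illustrates this with small arrays whose names (`m`, `s`, `c`) are only for illustration; `np.shares_memory` is used just to confirm the behavior.

```python
import numpy as np

m = np.zeros((3, 3))

s = m[0:2, 0:2]   # basic slicing returns a view, not a copy
s[:] = 1.0        # writing through the view also changes m
print(m)          # the upper-left 2x2 block is now filled with ones

c = m[0:2, 0:2].copy()  # an explicit copy is independent of m
c[:] = 5.0
print(m[0, 0])          # 1.0, m is not affected by changes to c

print(np.shares_memory(m, s))  # True
print(np.shares_memory(m, c))  # False
```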

Exercise 1.1

Create a two-dimensional array of random numbers with shape (4,8).

First, set the last column to -1 and then set the second row to 2

Boolean indexing

Arrays can be indexed using boolean arrays of the same shape.

For instance, consider these two arrays with the age and gender of a set of 10 people


In [ ]:
age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender= np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o' ,'m', 'o'])

Suppose that we want to select only the people whose gender is marked as 'o' (other).

The following statement gives a new boolean array. Each element tells us whether the condition is True or False for the corresponding person


In [ ]:
ii = (gender == 'o')
print(ii)

Now, if we want the ages of the people with gender 'o', all we have to do is:


In [ ]:
age[ii]

This logic can be extended to different conditions, for instance, let's select the items with age larger than 10 and smaller than 50


In [ ]:
ii = (age > 10) & (age < 50) # & is the symbol for the logical AND
print(age[ii])
print(gender[ii])

The following is also a valid syntax


In [ ]:
age[age>30]
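Besides `&`, conditions can be combined with `|` (element-wise OR) and negated with `~` (element-wise NOT). The sketch below reuses the age and gender arrays from above; the names `young_or_old` and `not_m` and the age thresholds are only for illustration. Note that each comparison needs its own parentheses, because `&` and `|` bind more tightly than `<`, `>` and `==`.

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

# | is the symbol for the element-wise logical OR
young_or_old = (age < 18) | (age > 65)
print(age[young_or_old])   # values 67, 89, 12, 2, 72

# ~ negates a boolean array element by element
not_m = ~(gender == 'm')
print(gender[not_m])       # values 'o', 'f', 'f', 'f', 'o', 'o'
```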

Exercise 1.2

Using a=np.random.normal(size=1000) generate an array of 1000 random numbers drawn from a normal (i.e. gaussian) distribution with mean zero and standard deviation of one.

Print the number of elements with values larger than 2.0. Is this number close to what you expected from the properties of a gaussian distribution?

Universal functions

Universal functions (or ufuncs) are functions that take arrays as inputs and return either arrays or scalars. They are fast (implemented in C) and make it possible to write simpler Python code without for loops. The numpy documentation provides a list of all available universal functions.

For instance, one could generate an array of values


In [ ]:
t = np.linspace(0.0, np.pi, 10)
print(t)

and then compute the values of the sin function


In [ ]:
print(np.sin(t))
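Ufuncs are not limited to functions of a single array. As a brief sketch, the example below combines two unary ufuncs (`np.sin`, `np.cos`) with two binary ones (`np.add`, `np.maximum`); the choice of functions and the variable names are only for illustration.

```python
import numpy as np

t = np.linspace(0.0, np.pi, 10)

# Unary ufuncs act on every element of one array
s = np.sin(t)
c = np.cos(t)

# Binary ufuncs combine two arrays element by element
total = np.add(s**2, c**2)      # same as s**2 + c**2
print(np.allclose(total, 1.0))  # True, since sin^2 + cos^2 = 1

# The second argument can also be a scalar
clipped = np.maximum(c, 0.0)    # negative cosines are replaced by 0
print(clipped)
```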

Exercise 1.3

Using a=np.random.normal(size=1000) generate an array of 1000 random numbers drawn from a normal (i.e. gaussian) distribution with mean zero and standard deviation of one.

Then using only ufuncs on a generate a new array b that is -1 wherever a is negative and 1 wherever a is positive.


In [ ]: